Turning Web Text and Search Queries into Factual Knowledge: Hierarchical Class Attribute Extraction
Abstract
A seed-based framework for textual information extraction allows for weakly supervised acquisition of open-domain class attributes over conceptual hierarchies, from a combination of Web documents and query logs. Automatically extracted labeled classes, each consisting of a label (e.g., painkillers) and an associated set of instances (e.g., vicodin, oxycontin), are linked under existing conceptual hierarchies (e.g., brain disorders and skin diseases are linked under the concepts BrainDisorder and SkinDisease, respectively). Attributes extracted for the labeled classes are propagated upwards in the hierarchy, to determine the attributes of hierarchy concepts (e.g., Disease) from the attributes of their subconcepts (e.g., BrainDisorder and SkinDisease).
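The upward propagation step can be illustrated with a minimal sketch. The abstract does not say how subconcept attributes are combined, so the averaging rule below is an assumption, and the hierarchy, attribute names, scores, and the function `propagate_upwards` are all hypothetical.

```python
# Hypothetical toy hierarchy: subconcept -> parent concept.
PARENT = {"BrainDisorder": "Disease", "SkinDisease": "Disease"}

# Hypothetical attribute scores for the labeled classes linked under each concept.
LEAF_ATTRIBUTES = {
    "BrainDisorder": {"symptoms": 0.9, "treatment": 0.8, "diagnosis": 0.4},
    "SkinDisease": {"symptoms": 0.7, "treatment": 0.6, "pictures": 0.5},
}


def propagate_upwards(parent, leaf_attributes):
    """Compute attribute scores for every concept, bottom-up.

    Assumption (not from the abstract): a concept's score for an attribute
    is the mean over its sources, where the sources are its subconcepts plus
    its own directly extracted attributes, if any. An attribute missing from
    a source contributes 0 for that source.
    """
    # Invert the child -> parent map into parent -> list of children.
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)

    concepts = set(parent) | set(parent.values()) | set(leaf_attributes)
    result = {}

    def visit(concept):
        if concept in result:
            return result[concept]
        # Resolve subconcepts first so scores flow from the leaves upwards.
        sources = [visit(c) for c in children.get(concept, [])]
        if concept in leaf_attributes:
            sources.append(leaf_attributes[concept])
        scores = {}
        for source in sources:
            for attr, score in source.items():
                scores[attr] = scores.get(attr, 0.0) + score / len(sources)
        result[concept] = scores
        return scores

    for concept in concepts:
        visit(concept)
    return result


scores = propagate_upwards(PARENT, LEAF_ATTRIBUTES)
# Attributes of Disease, ranked from highest to lowest propagated score.
ranked = sorted(scores["Disease"], key=scores["Disease"].get, reverse=True)
print(ranked)
```

Under this averaging rule, attributes shared by several subconcepts (here `symptoms` and `treatment`) rank above attributes specific to a single subconcept, which matches the intuition that a parent concept should inherit what its subconcepts have in common.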
Similar resources
Queries as a Source of Lexicalized Commonsense Knowledge
The role of Web search queries has been demonstrated in the extraction of attributes of instances and classes, or of sets of related instances and their class labels. This paper explores the acquisition of open-domain commonsense knowledge, usually available as factual knowledge, from Web search queries. Similarly to previous work in open-domain information extraction, knowledge extracted from t...
Low-Cost Supervision for Multiple-Source Attribute Extraction
Previous studies on extracting class attributes from unstructured text consider either Web documents or query logs as the source of textual data. Web search queries have been shown to yield attributes of higher quality. However, since many relevant attributes found in Web documents occur infrequently in query logs, Web documents remain an important source for extraction. In this paper, we intro...
Lightly-Supervised Attribute Extraction
Web search engines can greatly benefit from knowledge about attributes of entities present in search queries. In this paper, we introduce lightly-supervised methods for extracting entity attributes from natural language text. Using these methods, we are able to extract large numbers of attributes of different entities at fairly high precision from a large natural language corpus. We compare our...
Biperpedia: An Ontology for Search Applications
Search engines make significant efforts to recognize queries that can be answered by structured data and invest heavily in creating and maintaining high-precision databases. While these databases have a relatively wide coverage of entities, the number of attributes they model (e.g., GDP, CAPITAL, ANTHEM) is relatively small. Extending the number of attributes known to the search engine can enab...
Life-iNet: A Structured Network-Based Knowledge Exploration and Analytics System for Life Sciences
Search engines running on scientific literature have been widely used by life scientists to find publications related to their research. However, existing search engines in the life-science domain, such as PubMed, have limitations when applied to exploring and analyzing factual knowledge (e.g., disease-gene associations) in massive text corpora. These limitations are mainly due to the problems ...